### readme.txt
### replication files for Hopkins and King draft paper on automated
### content analysis

Files
A) Congressional replication

1 -- congressOS040108.R -- R code to run readme 
2 -- control10.txt -- control file; users must update the path inside
it to match their own machine using "replace all"
3 -- concomb -- folder with all Congressional speeches
4 -- congressOSOUTdd.Rdata -- output from running code; input
for graphs
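
The "replace all" step for a control file can also be scripted in R. This
is only a minimal sketch, assuming the control file is plain text and that
the path to change appears in it as a literal string. The helper name
replace_path and both example paths are hypothetical and are not taken
from control10.txt.

```r
# Hypothetical helper: swap one path prefix for another in a control file.
# fixed = TRUE treats old_path as a literal string, not a regular expression.
replace_path <- function(control_file, old_path, new_path,
                         out_file = control_file) {
  lines <- readLines(control_file, warn = FALSE)
  lines <- gsub(old_path, new_path, lines, fixed = TRUE)
  writeLines(lines, out_file)
  invisible(lines)
}

# Example call (both paths are placeholders; substitute your own):
# replace_path("control10.txt", "/old/path/to/concomb", "/my/path/to/concomb")
```

By default the file is overwritten in place, so keeping a backup copy of
the original control file first is a reasonable precaution.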

B) Immigration replication

1 -- immigfinal083107.R -- R code to run readme
2 -- controlimmig.txt -- control file; users must update the path inside
it to match their own machine using "replace all"
3 -- immigtexts -- folder with all immigration editorials
4 -- immigfinal083107.Rdata -- output from code; input for graphs
5 -- Coding_Instructions_2.doc -- Microsoft Word file documenting data
set codings

C) Enron replication

1 -- controlenron.txt -- control file; users must update the path inside
it to match their own machine using "replace all"
2 -- enrontexts -- folder with all Enron emails
3 -- enron022908.R -- R code to run readme
4 -- enronout083007.Rdata -- output from code; input for graphs

The output from A, B, and C is presented graphically by othergraphs022908.R
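
Before running the graphing script, it can help to check what a saved
.Rdata output file contains. The sketch below is an assumption-free use of
base R's load(); the helper name inspect_rdata is hypothetical, and the
names of the objects stored in each file are not documented here, so the
function simply lists whatever it finds.

```r
# Load a saved .Rdata file into its own environment (so nothing in the
# current workspace is overwritten) and print each object's name and class.
inspect_rdata <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)
  for (nm in obj_names) {
    cat(nm, ":", class(get(nm, envir = env))[1], "\n")
  }
  invisible(env)
}

# Example call:
# inspect_rdata("congressOSOUTdd.Rdata")
```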

D) Simulated Results 

1 -- synthetic032908.R -- generates synthetic data; tests estimator;
graphs results

E) Blog Results

1 -- kerrytest030108.R -- runs estimator on Kerry data; saves output
2 -- kerry.txt.gz -- gzip-compressed file containing Kerry data
formatted as a data set
3 -- bushXV030408.R -- cross-validation for Bush data set; estimates
optimal number of symptoms
4 -- optimalbush030408.R -- runs the estimator on modified Bush data;
saves output 
5 -- global.txt.gz -- gzip-compressed file providing blog data in
matrix format; n=7,530
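
The .gz files can be read into R without unpacking them first. The sketch
below assumes the files are whitespace-delimited text; whether they carry
a header row is not stated here, so header is left as an argument. The
helper name read_gz_table is hypothetical.

```r
# Read a gzip-compressed, whitespace-delimited text file into a data frame.
# Assumption: plain-text table inside the .gz; set header = TRUE if the
# first row holds column names.
read_gz_table <- function(path, header = FALSE, ...) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  read.table(con, header = header, ...)
}

# Example calls:
# kerry <- read_gz_table("kerry.txt.gz")
# dim(kerry)
```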

F) RMSE simulations

1 -- rcode1000N.R -- one sample script to be passed to Condor or
another parallel processing system to estimate the RMSE of the estimator
2 -- biasgraphs072108.R -- compiles the results from scripts such as
rcode1000N.R
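
For readers unfamiliar with the quantity being simulated, RMSE is the
square root of the mean squared difference between estimates and true
values. The sketch below shows only that generic definition; it is not
taken from rcode1000N.R, and the example vectors are made up for
illustration.

```r
# Root mean squared error between a vector of estimates and the truth.
rmse <- function(estimate, truth) {
  stopifnot(length(estimate) == length(truth))
  sqrt(mean((estimate - truth)^2))
}

# Illustrative values only (not results from the replication files):
truth    <- c(0.5, 0.3, 0.2)
estimate <- c(0.4, 0.4, 0.2)
rmse(estimate, truth)
```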